We report results showing that thermally induced openings of double-stranded DNA coincide with
the location of functionally relevant sites for transcription. Investigating both viral and bacterial 
DNA gene promoter segments, we found that the most probable opening occurs at the transcription 
start site. Minor openings appear to be related to other regulatory sites. Our results suggest that 
coherent thermal fluctuations play an important role in the initiation of transcription. Essential 
elements of the dynamics, in addition to sequence specificity, are nonlinearity and entropy, provided 
by local base-pair constraints. 

One of the most challenging subjects in biophysics is
the relation between biomolecular motions and function.
We present an example suggesting that functionality
arises from structurally coherent dynamics, whose
essential ingredients are sequence specificity,
nonlinearity and entropy; the nonlinearity from local
constraints is crucial. In particular, remarkably
successful comparisons of numerical simulations of a
minimal model (see below) of transverse dynamics for
gene promoter DNA segments with in vitro transcription
experiments show that the combination of all the
above-mentioned components controls coherent "bubble"
fluctuational openings of base-pairs around specific
sites of promoter DNA. Notably, the most prominent
opening occurs at the transcription initiation site,
while minor openings coincide with regulatory sites at
which transcription factors and other assisting
proteins are bound. These results demonstrate the
importance of the sequence structure, not simply as a
static and passive element, but as the template for
specific coherent fluctuations that determine function.
These coherent structures constitute a colored
spatio-temporal stochastic environment. This is an
example of the importance of a (dynamic) landscape of
substates.

We have used a microscopic model proposed by 
Peyrard and Bishop to describe the dynamics of the 
openings of double stranded DNA. This model focuses 
only on the most relevant degrees of freedom, namely the 
transverse stretching of the hydrogen bonds connecting 
complementary bases in the opposite strands of the dou- 
ble helix. Its reduced character, involving a small number 
of variables, makes it suitable for simulations over rela- 
tively long times and appropriate for gathering sufficient 
statistics. Subsequent key improvements [3] succeeded
in reproducing the abrupt (first-order) character of the
observed DNA denaturation transition. The potential
energy of this model reads

$$V = \sum_n \left[ D_n \left( e^{-a_n y_n} - 1 \right)^2 + \frac{k}{2} \left( 1 + \rho\, e^{-\beta (y_n + y_{n-1})} \right) (y_n - y_{n-1})^2 \right] \qquad (1)$$

Here the sum is over all the base-pairs of the DNA and $y_n$
denotes the displacement from the equilibrium position
of the relative distance between the bases within the $n$-th
base-pair, divided by $\sqrt{2}$. The Morse potential (other
similar potentials can also be used) in the first term pro- 
vides the effective interactions between complementary 
bases; it represents both the attraction due to the hy- 
drogen bonds forming the base-pairs and the repulsion 
of the negatively charged phosphates in the backbone 
of the two strands screened by the surrounding solvent. 
Beyond the exact details of this interaction, an impor- 
tant issue is a correct description of the nonlinearities. 
The parameters $D_n$ and $a_n$ of the on-site potential
distinguish between the two possible combinations of bases,
i.e. adenine-thymine (A-T) or guanine-cytosine (G-C), at 
site n, depending on the particular sequence. The second 
term in the total potential energy represents the stacking 
interaction potential between adjacent base-pairs. Here 
the nonlinear inter-site coupling, given by the exponential 
term that effectively modifies a harmonic spring constant, 
is essential for representing local constraints in nucleotide
motions, which result in long-range cooperative effects
[3]. As in elastic materials [4,13], it controls lattice
vibrations, yielding accurate entropic terms: the stiffening
of the coupling in the compact state compared to that in
the open state leads to an abrupt entropy-driven
transition. Physically, the constraint describes the change
of the next-neighbor stacking interaction due to the
distortion of the hydrogen bonds connecting a base-pair,
mediated through the redistribution of the electrons on
the corresponding bases.

(a) P5 sequence:

   (-45)                       (-20)
5'-GTGGC CATTTAGGGT ATATATGGCC GAGTGAGCGA
   GCAGGATCTC CATTTTGACC GCGAAATTTG AACG-3'
              (+1)                 (+24)

(b) AdMLP sequence:

   (-62)                    (-40)
5'-GC CACGTGACCA GGGGTCCCCG CCGGGGGGGT ATAAAAGGGG
   GCGGACCTCT GTTCGTCCTC ACTGTCTTCC GGATCGCTGT CCAG-3'
                         (+1)                 (+24)

(c) T7 sequence:

   (-40)                 (-20)
5'-ATGACCAGTT GAAGGACTGG AAGTAATACG
   ACTCAGTATA GGGACAATGC TTAAGGTCGC TCTCTAGGAG-3'
              (+1)                       (+30)

FIG. 1: Base-pair sequences of the studied DNA gene promoter fragments: (a) 69 base-pair long viral P5 promoter, (b) 86
base-pair long viral AdMLP promoter, and (c) 70 base-pair long bacterial T7 promoter.
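
As a minimal sketch, the potential energy described above (a Morse on-site term plus a stacking term whose spring constant is modified by an exponential factor) can be evaluated numerically as follows. The parameter values are the ones quoted for the simulations in the text; the function names, the use of NumPy, and the open-ended chain (no stacking term before the first base-pair) are illustrative assumptions, not part of the original work.

```python
import numpy as np

# Parameter values quoted in the text (energies in eV, lengths in angstrom).
K = 0.025      # harmonic stacking constant k, eV/Å^2
RHO = 2.0      # strength of the nonlinear stacking term (dimensionless)
BETA = 0.35    # decay constant of the nonlinear stacking term, 1/Å
MORSE = {"AT": (0.05, 4.2), "GC": (0.075, 6.9)}   # (D in eV, a in 1/Å)


def morse_parameters(sequence):
    """Return per-site arrays (D_n, a_n) for one strand given as a string."""
    pair_type = {"A": "AT", "T": "AT", "G": "GC", "C": "GC"}
    params = [MORSE[pair_type[base]] for base in sequence.upper()]
    D = np.array([p[0] for p in params])
    a = np.array([p[1] for p in params])
    return D, a


def potential_energy(y, D, a, k=K, rho=RHO, beta=BETA):
    """Total potential energy (eV) for displacements y (Å).

    Assumption: open-ended chain, so the stacking sum runs over the
    len(y) - 1 neighboring pairs only.
    """
    y = np.asarray(y, dtype=float)
    on_site = D * (np.exp(-a * y) - 1.0) ** 2
    dy = y[1:] - y[:-1]
    coupling = 1.0 + rho * np.exp(-beta * (y[1:] + y[:-1]))
    stacking = 0.5 * k * coupling * dy ** 2
    return on_site.sum() + stacking.sum()
```

For a uniformly and widely opened chain the stacking term vanishes and each base-pair contributes its Morse depth, which gives a quick sanity check of the implementation.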

Despite its simplified character, this model has
successfully reproduced not only the sharp melting
(denaturation) transition occurring when the two strands
of the DNA separate from each other as temperature
increases, but also the details of the precursor
(nucleation) fluctuational openings and the dynamics upon
approaching the denaturation transition. The coexistence
of two essential features is necessary for obtaining the
first-order-like transition: a nonlinear coupling constant
that decreases in the denatured phase, providing an
increase in entropy, and a plateau in the on-site
potential, which should not increase without bound.
Regarding the precursor fluctuations, intrinsic localized
modes nucleate as nonlinear bubble opening events that
subsequently interact and grow, providing the
experimentally observed denaturation bubbles. This
nucleation regime, precursor to the melting, extends over
temperatures several tens of kelvin below the melting
transition (the biologically relevant regime).

In addition to capturing the essential features of
thermally induced denaturation of long DNA chains, the
model has been used to reproduce the melting curves of
very short heterogeneous DNA segments, in excellent
quantitative agreement with experimental data.
Furthermore, it provides the characteristic multi-step
melting observed in single heterogeneous DNA molecules.
Recently, the model has been used to investigate charge
transport properties in a flexible DNA chain, where the
charge is coupled to the lattice degrees of freedom.
The bubbles, as relatively long-lived intrinsic
inhomogeneities ("hot spots") [13], represent a colored
noise environment, which qualitatively influences charge
dynamics.

Motivated by the successful description of the nonlin- 
ear thermal fluctuations, we have applied this model to 
explore the possible role of the intrinsic bubble openings 
for the transcriptional initiation and regulatory sites of 
specific promoter DNA sequences. In particular, we have 
studied the adeno-associated viral P5 promoter (P5), the
adenovirus major late promoter (AdMLP) and the bac- 
teriophage T7 core promoter (T7). The base-pair se- 
quences of these promoters are presented in Fig. 1. In 
vitro transcription experiments demonstrating the spe- 
cific initiation of RNA polymerase II transcription from 
DNA templates containing the corresponding promoter 
fragments are shown in Fig. 2. For the experimental de- 
tails see reference Q. 

We performed Langevin molecular dynamics (thereby
capturing thermal fluctuation and dissipation effects) for
nucleotides of mass $m$ evolving in the potential $V$ of
equation (1). We have used parameter values from the
literature: $k = 0.025$ eV/Å$^2$, $\rho = 2$, $\beta = 0.35$ Å$^{-1}$
for the inter-site coupling, while for the Morse potential
$D_{GC} = 0.075$ eV, $a_{GC} = 6.9$ Å$^{-1}$ for a GC base-pair,
and $D_{AT} = 0.05$ eV, $a_{AT} = 4.2$ Å$^{-1}$ for an AT pair.
The simulated temperature was 300 K (below the melting
temperature but in the precursor regime of bubble
formation).
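
A Langevin simulation of this kind could be sketched as below. The text does not state the nucleotide mass, the friction coefficient, the time step or the integration scheme, so the values used here (`mass=300` amu, `gamma=0.05` ps$^{-1}$, `dt=0.002` ps) and the BAOAB splitting integrator are assumptions chosen only for illustration, as are the function names and the open-ended chain. In practice an equilibration period would be discarded before sampling.

```python
import numpy as np

KB = 8.617333e-5   # Boltzmann constant in eV/K
EV = 9648.533      # 1 eV expressed in amu Å^2 / ps^2 (unit conversion)
K, RHO, BETA = 0.025, 2.0, 0.35   # stacking parameters quoted in the text


def energy(y, D, a):
    """Potential energy (eV) of an open-ended chain with displacements y (Å)."""
    d = y[1:] - y[:-1]
    coupling = 1.0 + RHO * np.exp(-BETA * (y[1:] + y[:-1]))
    return np.sum(D * (np.exp(-a * y) - 1.0) ** 2) + np.sum(0.5 * K * coupling * d ** 2)


def forces(y, D, a):
    """Analytic forces -dV/dy_n (eV/Å) for the same potential."""
    e = np.exp(-a * y)
    f = 2.0 * a * D * e * (e - 1.0)           # Morse on-site term
    d = y[1:] - y[:-1]
    E = RHO * np.exp(-BETA * (y[1:] + y[:-1]))
    soft = 0.5 * K * BETA * E * d ** 2        # from differentiating the exponential factor
    spring = K * (1.0 + E) * d                # from differentiating the quadratic factor
    f[1:] += soft - spring
    f[:-1] += soft + spring
    return f


def langevin_run(D, a, n_steps, temperature=300.0, mass=300.0, gamma=0.05,
                 dt=0.002, sample_every=500, seed=0):
    """Integrate Langevin dynamics with the BAOAB scheme; return sampled y.

    Units: Å, ps, amu, eV. All default arguments are hypothetical values.
    """
    rng = np.random.default_rng(seed)
    n = len(D)
    y = np.zeros(n)
    sigma_v = np.sqrt(KB * temperature * EV / mass)   # thermal velocity, Å/ps
    v = sigma_v * rng.standard_normal(n)
    c1 = np.exp(-gamma * dt)
    c2 = np.sqrt(1.0 - c1 ** 2)
    f = forces(y, D, a)
    samples = []
    for step in range(1, n_steps + 1):
        v += 0.5 * dt * f * EV / mass                 # half kick
        y += 0.5 * dt * v                             # half drift
        v = c1 * v + c2 * sigma_v * rng.standard_normal(n)   # friction + noise
        y += 0.5 * dt * v                             # half drift
        f = forces(y, D, a)
        v += 0.5 * dt * f * EV / mass                 # half kick
        if step % sample_every == 0:
            samples.append(y.copy())
    return np.array(samples)
```

Comparing `forces` against a finite-difference derivative of `energy` is a cheap way to catch sign errors in the stacking term before running long trajectories.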

FIG. 2: Autoradiography of [$^{32}$P]-labeled reverse transcripts
after separation by gel electrophoresis (lanes 3). Arrows
indicate the direction of specific transcription started from the
initiation site +1 in all cases. Lanes 1 indicate base position
markers obtained by chemical sequencing. (a) P5 promoter,
(b) AdMLP promoter, and (c) T7 promoter. Lane 4 in (a)
shows elimination of the transcription for the mutated P5
promoter, where the nucleotides at +2 and +3 have been changed
from AT to GC.

The statistics of the thermally induced openings were
obtained using 100 different realizations for each DNA
sequence studied. We ran each realization for 1 ns, after
reaching thermal equilibrium, and monitored the state of
the system every 1 fs. Thus we have $10^6$ events for each
one of the 100 realizations. At every event we checked the
displacements $y_n$ of the base-pairs at each site $n$ and the
following $m - 1$ ($m$ varying from 1 to 19) base-pairs. If
the openings at all these $m$ consecutive sites are greater
than a threshold value $y_{th}$ (varying from one tenth of an
ångström to a few Å) we assign a contribution to the opening
event at the $n$-th base-pair of the sequence. The obtained
opening probabilities along the studied DNA segments for
bubble sizes of $m = 10$ base-pairs and thresholds
$y_{th} = 1.0$ and 1.5 Å for accepting an opening are
presented in Fig. 3. (We recall that the real openings are
equal to $\sqrt{2}\, y_n$, so these thresholds correspond to
real openings of about 1.4 and 2.1 Å.) In all these cases
the most probable openings are located at the transcription
start site +1 (see Fig. 2). Furthermore, in the viral cases
the other distinct openings seem to be related to known
regulatory sites: in P5 the opening at the A/T-rich region
between -40 and -35 corresponds to the binding site of the
transcription factor Yin Yang 1, while in AdMLP the
second-highest opening is close to the binding site of the
TATA-box binding protein [16], which is necessary for
transcription. As can be seen from Fig. 3, openings of such
large widths and amplitudes are rare events in our
microscopic simulations, therefore requiring sufficient
statistics.
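
The opening criterion described above can be sketched as a sliding-window test over the recorded displacements. The function name, the array layout (one row per recorded event, one column per base-pair) and the choice to return NaN for start sites with fewer than $m$ base-pairs to their right are assumptions made for this illustration.

```python
import numpy as np


def opening_probability(trajectory, m, y_th):
    """Probability that a bubble of m base-pairs starts at each site.

    trajectory: array of shape (n_events, n_sites) holding y_n (Å).
    A start site n counts as open in an event when y_n, ..., y_{n+m-1}
    all exceed y_th. Sites too close to the end of the chain get NaN.
    """
    traj = np.asarray(trajectory, dtype=float)
    n_events, n_sites = traj.shape
    if not 1 <= m <= n_sites:
        raise ValueError("m must be between 1 and the number of sites")
    is_open = traj > y_th
    # Cumulative counts let us get the number of open sites in every window.
    csum = np.concatenate(
        [np.zeros((n_events, 1), dtype=int), np.cumsum(is_open, axis=1)], axis=1
    )
    window_fully_open = (csum[:, m:] - csum[:, :-m]) == m
    prob = np.full(n_sites, np.nan)
    prob[: n_sites - m + 1] = window_fully_open.mean(axis=0)
    return prob
```

Events from independent realizations can simply be stacked along the first axis before calling the function, since each recorded event contributes equally to the estimate.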

FIG. 3: Logarithm of the probability P for the occurrence
of an opening of 10 base-pair width and amplitude of more
than 2.1 Å (thick solid line) or 1.4 Å (dotted line) starting at
a particular site n of the DNA fragment, as a function of n,
for the (a) P5 promoter, (b) AdMLP promoter, and (c) T7
promoter. (Horizontal axis: base-pair sequence.)

FIG. 4: Logarithm of the probability P for the occurrence of
an opening of 10 base-pair width and amplitude of more than
2.1 Å (thick solid line) or 1.4 Å (dotted line) starting at a
particular site n of the DNA fragment, as a function of n for
the mutated P5 promoter (see text). (Horizontal axis:
base-pair sequence.)

FIG. 5: Logarithm of the probability P for the occurrence of
an opening of 10 base-pair width and amplitude of more than
2.1 Å (thick solid line) or 1.4 Å (dotted line) starting at a
particular site n of the DNA fragment, as a function of n for
the P5 promoter, obtained by linearizing the stacking
interaction term of the potential (i.e. setting $\rho = 0$ in
the potential energy (1)).

We stress that similar local sequences do not exhibit
the same opening probabilities; equal-size segments of
relatively weakly bound A/T pairs in different parts of
the promoter show very different statistics (compare for
example the region around -30 with that around +1 in
P5). Furthermore, the larger openings do not occur in
regions with longer A/T stretches, as might be intuitively
anticipated because of the weaker bonding. Effective
long-range cooperativity (from the nonlinear inter-site
potential in equation (1)) and competing localization
lengths due to the spatial disorder and the nonlinearity
are responsible for this high specificity: in general,
length-scale competition in nonlinear systems is known [17]
to lead to complex spatio-temporal (dynamic landscape)
behavior. The sensitivity of the cooperative/competing
phenomena, affected even by a single base-pair
modification, enhances the predictive power of our model;
a small mutation of the sequence (at a specific location)
is sufficient to completely eliminate bubble formation at
the transcription initiation site. In Fig. 4 we show
numerical calculations of the opening probabilities for a
mutated P5 promoter, where nucleotides +2 and +3 have been
changed from AT to GC. This mutation completely eliminates
the opening at the previous transcription start site, in
agreement with no transcriptional events occurring in the
corresponding experiment (see Fig. 2a, lane 4).
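
To make the mutation concrete, the sketch below maps promoter positions (which skip zero, running from -1 directly to +1) onto indices of the P5 sequence of Fig. 1 and applies the change at +2 and +3. Reading "changed from AT to GC" as A to G at +2 and T to C at +3 is an assumption; for the model only the resulting G-C base-pair type at both sites matters. The helper names are hypothetical.

```python
# P5 promoter fragment from Fig. 1, positions -45 to +24 (69 base-pairs).
P5 = ("GTGGC" "CATTTAGGGT" "ATATATGGCC" "GAGTGAGCGA"
      "GCAGGATCTC" "CATTTTGACC" "GCGAAATTTG" "AACG")
FIRST_POSITION = -45


def index_of(position, first_position=FIRST_POSITION):
    """Map a promoter position (there is no position 0) to a 0-based index."""
    if position == 0:
        raise ValueError("promoter numbering has no position 0")
    offset = position - first_position
    return offset - 1 if position > 0 else offset


def mutate(sequence, changes):
    """Return a copy of sequence with {position: new_base} substitutions."""
    bases = list(sequence)
    for position, base in changes.items():
        bases[index_of(position)] = base
    return "".join(bases)


# Assumed reading of the mutation: A -> G at +2 and T -> C at +3.
P5_MUTANT = mutate(P5, {2: "G", 3: "C"})
```

The mutated string can then be passed through the same per-site parameter assignment as the wild-type sequence, so that the two simulations differ only in the Morse parameters at the two mutated sites.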

We emphasize that, as in previous applications of this
model, the nonlinear inter-site coupling is crucial for its
success. For example, as can be seen in Fig. 5,
linearizing the stacking interaction term ($\rho = 0$)
strongly modifies the statistics of the openings of the P5
promoter, changes the position of the peaks along the
sequence, and eliminates the successful comparison with
the experimental observations. The nonlinear inter-site
coupling constitutes a minimal representation of the local
stacking constraint between neighboring base-pairs. As
in more general situations of displacive structural phase
transitions [4,13], such local constraints can lead to
long-range "elastic" interactions and macroscopic
cooperativity.
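
As a short worked illustration of what the linearization removes, consider the stacking term with the parameter values quoted in the text. The symbol $k_{\mathrm{eff}}$ for the effective spring constant is introduced here for convenience only.

```latex
% Stacking term between neighboring base-pairs and its linearized limit
W(y_n, y_{n-1}) = \frac{k}{2}\left(1 + \rho\, e^{-\beta (y_n + y_{n-1})}\right)(y_n - y_{n-1})^2
\;\xrightarrow{\;\rho = 0\;}\;
\frac{k}{2}\,(y_n - y_{n-1})^2

% Effective spring constant in the two limiting states (k = 0.025 eV/A^2, rho = 2)
k_{\mathrm{eff}} = k\left(1 + \rho\, e^{-\beta (y_n + y_{n-1})}\right):
\qquad
k_{\mathrm{eff}} \to k(1+\rho) = 0.075\ \mathrm{eV/\AA^2} \quad (y_n, y_{n-1} \to 0),
\qquad
k_{\mathrm{eff}} \to k = 0.025\ \mathrm{eV/\AA^2} \quad (y_n + y_{n-1} \gg 1/\beta \approx 2.9\ \mathrm{\AA})
```

With $\rho = 2$ the stacking is thus three times stiffer in the closed state than in the open state, whereas for $\rho = 0$ the coupling is the same harmonic spring in both states and the entropic gain on opening is lost.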

In summary, our model and simulations suggest that
structurally specific coherent thermal fluctuations
identify locations in the DNA sequences where the RNA
polymerase initiates transcription. Further, we find
indications that the thermal fluctuations also help in
recruiting other protein complexes participating in the
transcriptional process, by separating the DNA double
strand at specific locations. These bubbles precede
protein binding and their possible role is limited to the
very initial steps of the transcription. This suggests
that DNA, through structurally specific dynamics,
participates in directing its own transcription.